DNN Transfer Learning based Non-linear Feature Extraction for Acoustic Event Classification
Recent acoustic event classification research has focused on training
suitable filters to represent acoustic events. However, due to limited
availability of target event databases and linearity of conventional filters,
there is still room for improving performance. By exploiting the non-linear
modeling of deep neural networks (DNNs) and their ability to learn beyond
pre-trained environments, this letter proposes a DNN-based feature extraction
scheme for the classification of acoustic events. The effectiveness and
robustness to noise of the proposed method are demonstrated using a database of
indoor surveillance environments.
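The core idea, using the non-linear hidden activations of a pre-trained DNN as acoustic-event features, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the weights below are random stand-ins for a network that would in practice be pre-trained on a large source-domain corpus, and the 40-dimensional input, layer sizes, and ReLU activation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pre-trained" weights: 40-dim acoustic input -> 64 -> 32 hidden units.
# In the transfer-learning setting these would come from a network trained
# on a large, mismatched source corpus.
W1, b1 = rng.standard_normal((40, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.standard_normal((64, 32)) * 0.1, np.zeros(32)

def extract_features(x):
    """Forward pass through the hidden layers; the non-linear (ReLU)
    activations of the last hidden layer serve as the event features."""
    h1 = np.maximum(0.0, x @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return h2

# A batch of 10 frames of 40-dim acoustic features (e.g., log-mel energies).
frames = rng.standard_normal((10, 40))
feats = extract_features(frames)
print(feats.shape)  # (10, 32)
```

The extracted features would then be fed to a back-end classifier trained on the (small) target event database, which is where the transfer happens.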
Into-TTS : Intonation Template based Prosody Control System
Intonation plays an important role in conveying the speaker's intention.
However, current end-to-end TTS systems often fail to model proper
intonations. To alleviate this problem, we propose a novel, intuitive method to
synthesize speech in different intonations using predefined intonation
templates. Prior to the acoustic model training, speech data are automatically
grouped into intonation templates by k-means clustering, according to their
sentence-final F0 contour. Two proposed modules are added to the end-to-end TTS
framework: intonation classifier and intonation encoder. The intonation
classifier recommends a suitable intonation template to the given text. The
intonation encoder, attached to the text encoder output, synthesizes speech
abiding by the requested intonation template. The main contributions of our paper are:
(a) an easy-to-use intonation control system covering a wide range of users;
(b) better performance in wrapping speech in a requested intonation, with
improved pitch distance and MOS; and (c) feasibility of future integration
between TTS and NLP, with TTS able to utilize contextual information. Audio
samples are available at https://srtts.github.io/IntoTTS.
Comment: Submitted to INTERSPEECH 202
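The template-building step described above, grouping sentence-final F0 contours by k-means, can be sketched in a few lines. This is not the paper's code: the contour length (20 frames), the number of templates (k = 3), and the synthetic falling/rising/flat contours are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Minimal k-means: returns (centroids, labels)."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each contour to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned contours.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

t = np.linspace(0, 1, 20)
# Synthetic sentence-final F0 contours (Hz): falling, rising, and flat shapes.
contours = np.concatenate([
    200 - 60 * t + rng.normal(0, 3, (30, 20)),  # falling (declarative-like)
    180 + 70 * t + rng.normal(0, 3, (30, 20)),  # rising (interrogative-like)
    190 + 0 * t + rng.normal(0, 3, (30, 20)),   # flat
])
templates, labels = kmeans(contours, k=3)
print(templates.shape)  # (3, 20)
```

Each resulting centroid acts as one intonation template; at training time every utterance carries the label of its nearest template, which the intonation classifier and encoder then learn to predict and realize.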
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
With the recent developments in cross-lingual Text-to-Speech (TTS) systems,
L2 (second-language, or foreign) accent problems arise. Moreover, running a
subjective evaluation for such cross-lingual TTS systems is troublesome. The
vowel space analysis, which is often utilized to explore various aspects of
language including L2 accents, is a great alternative analysis tool. In this
study, we apply the vowel space analysis method to explore L2 accents of
cross-lingual TTS systems. Through the vowel space analysis, we observe the
following three phenomena: a) a parallel architecture (Glow-TTS) is less L2-accented
than an auto-regressive one (Tacotron); b) L2 accents are more dominant in
non-shared vowels in a language pair; and c) L2 accents of cross-lingual TTS
systems share some phenomena with those of human L2 learners. Our findings
imply that it is necessary for TTS systems to handle each language pair
differently, depending on their linguistic characteristics such as non-shared
vowels. They also hint that we can further incorporate linguistic knowledge in
developing cross-lingual TTS systems.
Comment: Submitted to ICASSP 202
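One common way to quantify vowel-space differences of the kind analyzed above is to compare the area of the polygon spanned by corner-vowel formants (F1, F2). The sketch below is illustrative only: the formant values are rough textbook figures for /i/, /a/, /u/, not measurements from the paper, and "shrinkage" is one simple summary statistic among many.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area of a polygon given ordered (x, y) vertices."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Corner vowels (F1, F2) in Hz, ordered around the polygon: /i/, /a/, /u/.
# Values are rough illustrative figures, not data from the study.
native = np.array([[270, 2290], [730, 1090], [300, 870]], float)
accented = np.array([[350, 2000], [650, 1150], [380, 950]], float)

# A centralized (L2-accented) vowel space typically covers a smaller area.
shrinkage = 1 - polygon_area(accented) / polygon_area(native)
print(f"vowel space shrinkage: {shrinkage:.1%}")
```

Comparing such areas (or the positions of individual vowels) across a TTS system's native and cross-lingual output gives an objective proxy for the L2-accentedness that would otherwise require a subjective listening test.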